What's New with Sintelix Extension 2.0.22!
Welcome to Sintelix Extension V2.0.22
Along with the release of Sintelix 7.8, the Sintelix Extension has been updated to improve the Harvesting experience and the ability to create powerful Harvesting Rule Sets.
New Features
Resize the Rule Set Panes
You can resize the three panes by hovering over a pane separator, then clicking and dragging it.
You can also select the collapse icon to collapse the Configurations, Harvester Rule Sets, and Collection Evaluation panes.
Plain Full Page document view
When viewing the Full Document, you can now select a Plain Doc view, stripped of all styles and CSS classes. This can, in some circumstances, provide better visibility of elements and an easier way to select elements.
Rule Sorting
The position of the rules in a Rule Set Configuration matters: rules at the bottom overwrite those at the top if their Rule Paths overlap.
You can now drag and drop rules in the Rules table to rearrange them into the order you need.
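The effect of rule order can be illustrated with a minimal sketch. The rule format, the `resolve` function, and the suffix-based path matching below are all hypothetical stand-ins for illustration; they are not the Sintelix rule format or matching logic.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a rule is (rule_path, action). Not the Sintelix format.
Rule = Tuple[str, str]


def resolve(rules: List[Rule], element_paths: List[str]) -> Dict[str, str]:
    """Apply rules top to bottom; for any element matched by more than
    one rule, the rule lower in the list overwrites the earlier one."""
    result: Dict[str, str] = {}
    for rule_path, action in rules:           # top of the table first
        for path in element_paths:
            if path.endswith(rule_path):      # simplistic stand-in for path matching
                result[path] = action         # later rules win
    return result


elements = ["body > div > p", "body > section > p"]

# Broad rule at the bottom: it overwrites the narrower rule above it.
broad_last = resolve([("div > p", "include"), ("p", "exclude")], elements)

# Narrow rule at the bottom: it overrides the broad rule for its own matches only.
narrow_last = resolve([("p", "exclude"), ("div > p", "include")], elements)

print(broad_last)
print(narrow_last)
```

In this sketch, moving the narrower rule to the bottom is what lets it take effect, which is why being able to reorder rules by drag and drop is useful.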
Duplicate Rule
To copy a rule, select the copy icon next to the rule.
Result: The copied rule will be added just below the current rule, and will have the same name.
This can be useful when you want to:
- test alternative settings on a rule while keeping a backup of the original rule
- create rules that are similar with slight variations in tags or classes
- create a negative rule to work with the copied rule.
Edit Rule path
You can now edit a Tag in the Rule path:
- Select the icon next to each tag to insert a new tag; by default, a DIV tag is added.
- Double-click a tag to change the tag type.
This capability can be useful to customise the Rule Path.
Class Selector
You can create and apply your own classes to a Rule Path tag:
- select the magnifying glass icon
- select the Create Custom Class option at the bottom of the dropdown list
- enter a custom class name in the field displayed
- select the confirm icon to add the class (or the cancel icon to cancel).
Pseudo Class Filter
You can apply a filter to a Rule Set that lets Sintelix select an element in a group based on its position.
For example, this is useful for selecting the first and last links of a list to “enable infinite scrolling” by harvesting the links to the previous and next pages.
You can choose to select the:
- first: only selects the first element of a group
- last: only selects the last element of a group
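As a rough illustration of position-based selection, the sketch below picks the first or last item of a group. The function name `apply_position_filter` and the sample link list are hypothetical; the behaviour is presumably comparable to CSS structural pseudo-classes such as `:first-child` and `:last-child`, but that is an assumption rather than a description of how Sintelix implements the filter.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_position_filter(group: Sequence[T], position: str) -> List[T]:
    """Return only the first or last element of a group (hypothetical helper)."""
    if position not in ("first", "last"):
        raise ValueError(f"unsupported position: {position!r}")
    if not group:
        return []                      # nothing to select in an empty group
    return [group[0]] if position == "first" else [group[-1]]


# Hypothetical pagination links, in document order.
links = ["/page/1", "/page/2", "/page/3", "/page/4"]

first_link = apply_position_filter(links, "first")
last_link = apply_position_filter(links, "last")
print(first_link, last_link)
```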
Filter IMG by Size
This option is only displayed when the last tag in the Rule Path is an IMG tag. You can set a maximum size for an IMG.
The filter can be used in combination with NEG rules to exclude IMGs of a certain size (like 1x1 placeholders).
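The combination of a size filter and a NEG rule can be sketched as follows. The `Img` class and both functions are hypothetical, and the sketch assumes that the maximum size applies to both width and height and is inclusive; the release notes do not specify either detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Img:
    src: str
    width: int
    height: int


def within_max_size(img: Img, max_width: int, max_height: int) -> bool:
    """Size filter: match images no larger than the given maximum
    (assumed inclusive, applied to both dimensions)."""
    return img.width <= max_width and img.height <= max_height


def exclude_matching(images: List[Img], max_width: int, max_height: int) -> List[Img]:
    """Mimic a NEG rule: drop every image that the size filter matches."""
    return [img for img in images if not within_max_size(img, max_width, max_height)]


images = [Img("pixel.gif", 1, 1), Img("photo.jpg", 800, 600)]

# A NEG rule with a 1x1 maximum removes the placeholder and keeps the photo.
kept = exclude_matching(images, max_width=1, max_height=1)
print([img.src for img in kept])
```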
Attribute Extraction
You can extract HTML attributes from the selected elements.
Once the option is selected, you can choose to:
- display the attributes before or after the selected element
- display the attribute "Name & Value" or just the "Value"
- replace the selected element by its attribute list.
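The three choices above can be pictured with a small sketch built on Python's standard `html.parser` module. The `extract` function, its `show` and `placement` parameters, and the output format are hypothetical; the actual output produced by Sintelix may be formatted differently.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class FirstTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag and all text content."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: List[Tuple[str, str]] = []
        self.text = ""
        self._seen = False

    def handle_starttag(self, tag, attrs):
        if not self._seen:
            # Valueless attributes arrive as None; store them as empty strings.
            self.attrs = [(name, value or "") for name, value in attrs]
            self._seen = True

    def handle_data(self, data):
        self.text += data


def extract(html: str, show: str = "name_value", placement: str = "after") -> str:
    """Hypothetical helper: show attributes before, after, or instead of the text."""
    parser = FirstTagAttrs()
    parser.feed(html)
    parser.close()
    if show == "name_value":
        parts = [f'{name}="{value}"' for name, value in parser.attrs]
    else:  # "value": values only
        parts = [value for _, value in parser.attrs]
    attr_text = " ".join(parts)
    if placement == "before":
        return f"{attr_text} {parser.text}"
    if placement == "after":
        return f"{parser.text} {attr_text}"
    return attr_text  # "replace": the attribute list stands in for the element


link = '<a href="/next" title="Next page">Next</a>'
print(extract(link))
print(extract(link, show="value", placement="before"))
print(extract(link, show="value", placement="replace"))
```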
Improvements and Fixes
Responsive IMG extraction
Some websites use responsive images by setting the “srcset” HTML attribute instead of the “src” attribute.
When this technique was used, Sintelix failed to retrieve the image.
Sintelix now sets the “src” of a responsive image to the largest image available in its “srcset” attribute.
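Choosing the largest image from a “srcset” value can be sketched as below. This is a simplified illustration, not the Sintelix implementation: `parse_srcset` and `largest_src` are hypothetical names, and the parser assumes candidates are separated by commas and that URLs contain no commas, which is not true of every valid “srcset” value.

```python
from typing import List, Optional, Tuple


def parse_srcset(srcset: str) -> List[Tuple[str, float]]:
    """Split a srcset value into (url, size) pairs.

    The size is the number in the width ("800w") or density ("2x")
    descriptor; a candidate without a descriptor counts as 1x.
    """
    candidates: List[Tuple[str, float]] = []
    for entry in srcset.split(","):
        parts = entry.split()
        if not parts:
            continue                       # skip empty entries
        url = parts[0]
        size = 1.0                         # default when no descriptor is given
        if len(parts) > 1 and parts[1][-1] in "wx":
            try:
                size = float(parts[1][:-1])
            except ValueError:
                pass                       # keep the default for malformed descriptors
        candidates.append((url, size))
    return candidates


def largest_src(srcset: str) -> Optional[str]:
    """Return the URL of the largest candidate, or None if there is none."""
    candidates = parse_srcset(srcset)
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[1])[0]


by_width = largest_src("small.jpg 480w, medium.jpg 800w, large.jpg 1200w")
by_density = largest_src("logo.png 1x, logo@2x.png 2x")
print(by_width, by_density)
```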
Other Fixes
- Fixed issue with selecting items in the Errors tab table.
- Fixed issue with Advanced options not cleanly collapsing.
- Added scrolling to long Rule Paths so that the action buttons remain visible.
- Fixed issue with text boxes not resizing within the limits of the dialog edges.
- Fixed issue with pre-click elements not showing up in the preview.
- Fixed issue with images and other elements displaying outside the Full Page document pane.
Help Updates
The Help document for the Sintelix Extension has had significant updates. The two principal areas are:
- User Guide > Harvester > Harvest via Sintelix Extension
- Configurations Guide > Harvester Rule Sets